231 research outputs found
Exploring Communities in Large Profiled Graphs
Given a graph G and a vertex q, the community search (CS) problem
aims to efficiently find a subgraph of G whose vertices are closely related
to q. Communities are prevalent in social and biological networks, and can be
used in product advertisement and social event recommendation. In this paper,
we study profiled community search (PCS), where CS is performed on a profiled
graph. This is a graph in which each vertex has labels arranged in a
hierarchical manner. Extensive experiments show that PCS can identify
communities with themes that are common to their vertices, and is more
effective than existing CS approaches. As a naive solution for PCS is highly
expensive, we have also developed a tree index that facilitates efficient
online solutions for PCS.
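A common formalization of community search (the abstract does not specify the exact cohesiveness model, so this is an illustrative sketch, not the paper's algorithm) is to return the connected k-core containing the query vertex: iteratively peel vertices whose degree drops below k, then keep the component of q.

```python
from collections import deque

def community_search(adj, q, k=2):
    """Illustrative sketch: return the connected k-core containing query
    vertex q. adj maps each vertex to its set of neighbours."""
    # Peel vertices whose degree falls below k (k-core decomposition).
    alive = {v: set(nbrs) for v, nbrs in adj.items()}
    queue = deque(v for v, nbrs in alive.items() if len(nbrs) < k)
    while queue:
        v = queue.popleft()
        if v not in alive:
            continue
        for u in alive.pop(v):
            if u in alive:
                alive[u].discard(v)
                if len(alive[u]) < k:
                    queue.append(u)
    if q not in alive:
        return set()  # q survives in no k-core
    # Return the connected component of the k-core that contains q.
    seen, stack = {q}, [q]
    while stack:
        for u in alive[stack.pop()]:
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return seen
```

Profiled CS additionally constrains the returned community to share a common label theme, which a real solution would enforce on top of such a structural core.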
Decouple Knowledge from Parameters for Plug-and-Play Language Modeling
Pre-trained language models (PLMs) have achieved impressive results in various
NLP tasks. It has been revealed that one of the key factors in their success is
that the parameters of these models implicitly learn all kinds of knowledge during
pre-training. However, encoding knowledge implicitly in the model parameters
has two fundamental drawbacks. First, the knowledge is neither editable nor
scalable once the model is trained, which is especially problematic given that
knowledge is constantly evolving. Second, it lacks interpretability and
prevents humans from understanding which knowledge a PLM requires for a certain
problem. In this paper, we introduce PlugLM, a pre-training model with
differentiable plug-in memory (DPM). The key intuition is to decouple the
knowledge storage from model parameters with an editable and scalable key-value
memory and leverage knowledge in an explainable manner by knowledge retrieval
in the DPM. To justify this design choice, we conduct evaluations in three
settings: (1) domain adaptation, where PlugLM obtains an average improvement of
3.95 F1 across four domains without any in-domain pre-training; (2) knowledge
update, where PlugLM can absorb new knowledge in a training-free way after
pre-training is done; and (3) in-task knowledge learning, where PlugLM can be
further improved by incorporating training samples into the DPM with knowledge
prompting.
Comment: ACL 2023 Findings
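The core of a key-value plug-in memory is retrieval: score the query representation against all memory keys, keep the top-k entries, and fuse their values with attention weights. The following is a minimal sketch under that assumption (the function name and fusion rule are illustrative, not PlugLM's exact implementation); note that editing the knowledge is a plain array update, with no gradient steps.

```python
import numpy as np

def retrieve_from_dpm(query, keys, values, top_k=2):
    """Illustrative sketch of key-value memory retrieval: score every
    memory key against the query, keep the top-k entries, and fuse their
    values with softmax-normalised attention weights."""
    scores = keys @ query                  # similarity of query to each key
    top = np.argsort(scores)[-top_k:]      # indices of the k best-scoring keys
    weights = np.exp(scores[top] - scores[top].max())
    weights /= weights.sum()               # softmax over the retrieved scores
    return weights @ values[top]           # fused knowledge vector

# Knowledge update is training-free: overwrite or append rows of
# `keys` and `values` and the next retrieval sees the new knowledge.
```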
Measuring and Relieving the Over-smoothing Problem for Graph Neural Networks from the Topological View
Graph Neural Networks (GNNs) have achieved promising performance on a wide
range of graph-based tasks. Despite their success, one severe limitation of
GNNs is the over-smoothing issue (indistinguishable representations of nodes in
different classes). In this work, we present a systematic and quantitative
study on the over-smoothing issue of GNNs. First, we introduce two quantitative
metrics, MAD and MADGap, to measure the smoothness and over-smoothness of the
graph node representations, respectively. Then, we verify that smoothing is
inherent to GNNs, and that the critical factor leading to over-smoothness is the
low information-to-noise ratio of the message received by the nodes, which is
partially determined by the graph topology. Finally, we propose two methods to
alleviate the over-smoothing issue from the topological view: (1) MADReg, which
adds a MADGap-based regularizer to the training objective; and (2) AdaGraph, which
optimizes the graph topology based on the model predictions. Extensive
experiments on 7 widely-used graph datasets with 10 typical GNN models show
that the two proposed methods are effective for relieving the over-smoothing
issue, thus improving the performance of various GNN models.
Comment: Accepted by AAAI 2020. This complete version contains the appendix
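The two metrics can be sketched directly from the abstract's description: MAD is the mean cosine distance over a selected set of node pairs, and MADGap contrasts remote pairs with neighbouring pairs. The masking convention below is an assumption for illustration (the paper defines the exact neighbour/remote split by graph distance).

```python
import numpy as np

def mad(h, mask):
    """Mean Average Distance: average cosine distance between the node
    pairs selected by the boolean mask. h is (num_nodes, dim)."""
    norm = np.linalg.norm(h, axis=1, keepdims=True)
    cos = (h @ h.T) / (norm * norm.T)      # pairwise cosine similarity
    dist = 1.0 - cos                       # cosine distance
    sel = dist * mask
    # Average first over each node's selected targets, then over nodes
    # that have at least one selected target.
    row_cnt = mask.sum(axis=1)
    row_avg = np.divide(sel.sum(axis=1), row_cnt,
                        out=np.zeros(len(h)), where=row_cnt > 0)
    return row_avg[row_cnt > 0].mean()

def mad_gap(h, remote_mask, neighbor_mask):
    """MADGap: MAD over remote node pairs minus MAD over neighbouring
    pairs. Small or negative values indicate over-smoothing."""
    return mad(h, remote_mask) - mad(h, neighbor_mask)
```

A MADGap near zero means remote nodes are as similar as neighbours, i.e. representations have become indistinguishable.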
Stochastic Bridges as Effective Regularizers for Parameter-Efficient Tuning
Parameter-efficient tuning methods (PETs) have achieved promising results in
tuning large pre-trained language models (PLMs). By formalizing frozen PLMs and
additional tunable parameters as systems and controls respectively, PETs can be
theoretically grounded to optimal control and further viewed as optimizing the
terminal cost and running cost in the optimal control literature. Despite the
elegance of this theoretical grounding, in practice, existing PETs often ignore
the running cost and only optimize the terminal cost, i.e., focus on optimizing
the loss function of the output state, regardless of the running cost that
depends on the intermediate states. Since it is non-trivial to directly model
the intermediate states and design a running cost function, we propose to use
latent stochastic bridges to regularize the intermediate states and use the
regularization as the running cost of PETs. As the first work to propose
regularized PETs that use stochastic bridges as the regularizers (running
costs) for the intermediate states, we show the effectiveness and generality of
this regularization across different tasks, PLMs and PETs. In view of the great
potential and capacity, we believe more sophisticated regularizers can be
designed for PETs and better performance can be achieved in the future. The
code is released at
\url{https://github.com/thunlp/stochastic-bridge-pet/tree/main}.
Comment: ACL 2023 Findings
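The running-cost idea can be illustrated with a deliberately simplified stand-in (not the paper's latent-space formulation): treat the stack of intermediate hidden states as a path and penalise its deviation from a Brownian bridge pinned at the first and last states. The bridge mean at time t is the linear interpolation of the endpoints, and the t(1-t) bridge variance makes deviations near the endpoints cost more.

```python
import numpy as np

def bridge_running_cost(states):
    """Simplified sketch of a stochastic-bridge running cost: penalise
    intermediate states' squared deviation from the Brownian-bridge mean
    between the first and last states, scaled by the bridge variance."""
    L = len(states) - 1
    h0, hT = states[0], states[-1]
    cost = 0.0
    for i in range(1, L):
        t = i / L
        mean = (1 - t) * h0 + t * hT       # bridge mean at time t
        var = t * (1 - t)                  # bridge variance at time t
        cost += np.sum((states[i] - mean) ** 2) / var
    return cost / max(L - 1, 1)
```

In a PET this term would be added to the task loss as the running cost, alongside the usual terminal cost on the output state.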
Systematic Analysis of Impact of Sampling Regions and Storage Methods on Fecal Gut Microbiome and Metabolome Profiles.
The contribution of human gastrointestinal (GI) microbiota and metabolites to host health has recently become much clearer. However, many confounding factors can influence the accuracy of gut microbiome and metabolome studies, resulting in inconsistencies in published results. In this study, we systematically investigated the effects of fecal sampling regions and storage and retrieval conditions on gut microbiome and metabolite profiles from three healthy children. Our analysis indicated that compared to homogenized and snap-frozen samples (standard control [SC]), different sampling regions did not affect microbial community alpha diversity, while a total of 22 of 176 identified metabolites varied significantly across different sampling regions. In contrast, storage conditions significantly influenced the microbiome and metabolome. Short-term room temperature storage had a minimal effect on the microbiome and metabolome profiles. Sample storage in RNALater showed a significant level of variation in both microbiome and metabolome profiles, independent of the storage or retrieval conditions. The effect of RNALater on the metabolome was stronger than the effect on the microbiome, and individual variability between study participants outweighed the effect of RNALater on the microbiome. We conclude that homogenizing stool samples was critical for metabolomic analysis but not necessary for microbiome analysis. Short-term room temperature storage had a minimal effect on the microbiome and metabolome profiles and is recommended for short-term fecal sample storage. In addition, our study indicates that the use of RNALater as a storage medium of stool samples for microbial and metabolomic analyses is not recommended.
IMPORTANCE: The gastrointestinal microbiome and metabolome can provide a new angle to understand the development of health and disease. Stool samples are most frequently used for large-scale cohort studies. Standardized procedures for stool sample handling and storage can be a determining factor for performing microbiome or metabolome studies. In this study, we focused on the effects of stool sampling regions and stool sample storage conditions on variations in the gut microbiome composition and metabolome profile.
Infection and Infertility
Infection is a multifactorial process that can be induced by a virus, bacterium, or parasite. It may cause many diseases, including obesity, cancer, and infertility. In this chapter, we focus our attention on the association between infection and fertility alteration. Numerous studies have suggested that genetic polymorphisms influencing infection are associated with infertility. Therefore, we also review the genetic influence on infection and the risk of infertility.